Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: TunnelCommunity multiprocessing #2607

Closed
wants to merge 6 commits into from

Conversation

qstokkink
Copy link
Contributor

@qstokkink qstokkink commented Nov 1, 2016

Related to #1882 .

This PR adds a multiprocessing option for the TunnelCommunity by using Twisted's reactor.spawnProcess.

Architecture

The architecture of this implementation consists of two halves:

  1. The main process's view
  2. The subprocess's view

The following subsections will cover these two views. After that the generic process interface will be discussed.

Main process view

classtree_full_parent
Using the pooled option in the settings the main process can decide to substitute its TunnelCommunity (/HiddenCommunity) for a PooledTunnelCommunity. This PooledTunnelCommunity offers the same interface as the normal TunnelCommunity. The big difference of course, is that it offloads (1) circuit creation (create_circuit), (2) sending LibTorrent data (send_data) and (3) retrieving download info/peers (monitor_downloads) to other (child) processes.

This offloading to child processes is handled through the ProcessManager class, leaving the PooledTunnelCommunity solely in charge of decorating the TunnelCommunity interface. This keeps the code clean and focused.

The wrapper for the child processes which the ProcessManager uses, is the TunnelProcess class (processes/tunnel_childprocess.py). The TunnelProcess takes care of the high-level interfacing with the TunnelSubprocess class running on the actual child process (more on that in the next subsection).
The low-level twisted Protocol interfacing is handled by the TunnelProcess's parent, the ChildProcess.

Subprocess view

classtree_subprocess_fuller
The TunnelSubprocess is in charge of running a TunnelCommunity which supports RPC calls from the main process. This is achieved (in line with the existing tunnel_helper code) by running a very light-weight Session. The low-level interfacing with the main process is handled by the TunnelSubprocess's parent class, the Subprocess class.

Note that the TunnelSubprocess is in charge of all encryption/decryption and management of circuit and hop constructs. The TunnelSubprocess will share its circuits and hops with the main process, but it is the only process allowed to edit these.

The IProcess

To minimize duplicate code, both the TunnelProcess and TunnelSubprocess inherit from the IProcess interface. This interface is implemented on the ChildProcess by means of the twisted ProcessProtocol.transport and on the Subprocess using os file descriptors and raw threads. These raw threads on the Subprocess are delegated to the twisted threadpool a.s.a.p. to avoid threading issues. Building on the IProcess is the RPCProcess which provides generic serialization for RPC calls.

The generic RPC interface, in turn, allows for object synchronization using the SyncDict class. This class is a subclass of the dict class, which checks for updates to its entries (RemoteObjects) and sends out minimized update messages/diffs to synchronize the main process's view of the shared objects.

Data flows

There are three data flows between the main process and a child process. These consist of:

  1. Control data
  2. Raw data
  3. Exit messages

These data flows are implemented using separate process file descriptors (so, next to stdin, stdout and stderr). Note that this leaves the stdout and stderr open for debugging purposes.
The rationale for each of these streams will be discussed in the following subsections; each of these different flows has radically different data characteristics, which would otherwise interfere.

Control flow

The control flow handles messages with high diversity and low volume. This comes down to the main process requesting community setup, circuit creation and download monitoring and the child process forwarding notifications, SyncDict changes and circuit deaths.

Raw data flow

The raw data flow deals with no diversity and high volume messages. Its only purpose is to serialize and deserialize LibTorrent data as fast as possible.

Exit messages

The exit messages have a low latency requirement. If these messages were to have to wait for both data and control messages this would add unnecessary delay to Tribler shutdowns. This turned out to be especially important in a previous Proof of Concept, in which a bug clogged the pipe and the process could not exit (non-forcefully).

Reviewing this PR

The commit history of this PR has been fabricated in such a way that:

  1. The first commit handles TunnelCommunity and HiddenCommunity refactoring, such that they utilize SyncDicts and allow for control message propagation to an unknown process entity, should it exist (which it doesn't at this point).
  2. The second commit adds the process views as discussed in the architecture section
  3. The third commit adds the intergration into the Tribler options/defaults
  4. The fourth commit adds all of the unit tests

@qstokkink qstokkink changed the title [WIP] TunnelCommunity multiprocessing WIP: TunnelCommunity multiprocessing Nov 1, 2016
@qstokkink qstokkink force-pushed the tunnelprocess_rc2 branch 5 times, most recently from ce63494 to 3025cab Compare November 1, 2016 13:25
@qstokkink
Copy link
Contributor Author

Ok.. within the same timestamp a 1.9MB download goes from 0.0 to 1.0 after checkpointing. @devos50 is this the infamous checkpointing bug? (Or is this my error?)

DEBUG   1478006948 LibtorrentDownloadImpl:LibtorrentDownloadImpl:490  852ffeabfd7c56d102721fe4ded93ada2bb714ea get resume data {'active_time': 0, 'num_incomplete': 16777215, 'announce_to_lsd': 1, 'seed_mode': 0, 'pieces': '\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', 'paused': 0, 'seeding_time': 0, 'last_scrape': 0, 'info-hash': '\x85/\xfe\xab\xfd|V\xd1\x02r\x1f\xe4\xde\xd9:\xda+\xb7\x14\xea', 'upload_rate_limit': 0, 'max_uploads': 16777215, 'max_connections': 16777215, 'unfinished': [], 'num_downloaded': 16777215, 'total_downloaded': 0, 'priority': 0, 'file-format': 'libtorrent resume file', 'peers6': '', 'added_time': 1478006947, 'banned_peers6': '', 'file_priority': [1], 'last_seen_complete': 0, 'total_uploaded': 0, 'piece_priority': '\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01', 'file-version': 1, 'announce_to_dht': 1, 'auto_managed': 0, 'save_path': '/home/jenkins/workspace/GH_Tribler_PR_tests_linux/tribler/Tribler/Test/data', 'peers': '', 'completed_time': 0, 'blocks per piece': 2, 'download_rate_limit': 0, 'libtorrent-version': '1.0.7.0', 'banned_peers': '', 'sequential_download': 0, 'announce_to_trackers': 1, 'num_complete': 16777215, 'finished_time': 0, 'last_upload': 0, 'trackers': [['http://localhost/announce']], 'super_seeding': 0, 'last_download': 0}
DEBUG   1478006948 LibtorrentDownloadImpl:LibtorrentDownloadImpl:496  tlm: network checkpointing: to file /home/jenkins/workspace/GH_Tribler_PR_tests_linux/tmp/tmpnCpnNV_tribler_test_session/dot.Tribler2/dlcheckpoints/852ffeabfd7c56d102721fe4ded93ada2bb714ea.state
DEBUG   1478006948 LibtorrentDownloadImpl:LibtorrentDownloadImpl:822  Torrent video.avi PROGRESS 1.0 DLSTATE 4 SEEDTIME 1

@devos50
Copy link
Contributor

devos50 commented Nov 1, 2016

@qstokkink no I can't explain that error unfortunately.

@qstokkink
Copy link
Contributor Author

This is an actual bug in my code. Confirmed to fail on localhost reliably for this branch and pass reliably on devel.

@qstokkink
Copy link
Contributor Author

qstokkink commented Nov 2, 2016

Alright, the bug is fixed by replacing a line of code I thought was duplicate (which it is, normally).

The cause is that the LaunchManyCore adds its own tunnel_community to itself (as it should), but the tests disable this and inject their own community and have the tunnel_community fix the reference in the Session for them.

We should probably look at refactoring the following line of code into the tests, where it belongs:
https://github.com/Tribler/tribler/blob/devel/Tribler/community/tunnel/tunnel_community.py#L321

@qstokkink qstokkink changed the title WIP: TunnelCommunity multiprocessing READY: TunnelCommunity multiprocessing Nov 2, 2016
@qstokkink
Copy link
Contributor Author

@devos50 ready for whenever you feel masochistic enough to review

@devos50
Copy link
Contributor

devos50 commented Nov 2, 2016

@qstokkink there is much code coverage to gain. When will you write the unit tests for these uncovered lines?

@devos50
Copy link
Contributor

devos50 commented Nov 2, 2016

@qstokkink I also see many pylint errors that are easy to fix

@qstokkink
Copy link
Contributor Author

qstokkink commented Nov 2, 2016

@devos50 As far as code coverage goes, I can write unit tests for the following classes somewhat easily:

  • Tribler/community/tunnel/processes/line_util.py
  • Tribler/community/tunnel/processes/rpcprocess.py
  • Tribler/community/tunnel/remotes/remote_object.py
  • Tribler/community/tunnel/remotes/sync_dict.py

And I really should, you are correct: I'll mark this as WIP until I have implemented these. The rest would really make more sense being covered by an integration test.

As far as the pylint errors go, I believe fixing them all would be very much possible and very much treating the metric: I am prepared to defend each of these violations.

@qstokkink qstokkink changed the title READY: TunnelCommunity multiprocessing WIP: TunnelCommunity multiprocessing Nov 2, 2016
@qstokkink
Copy link
Contributor Author

Alright, unit tests are now covering all classes which are not:

  1. A data class
  2. In charge of spawning processes
  3. Non-functional without an actual process

I think these would be better suited in a Gumby integration test, which will follow later.

Note that inclusion of category 2 would mean having to come in with root access once in a while to clean up orphaned processes. Category 3's would be possible using a really fat testing framework, but then the unit tests (a) become small integration tests, (b) would also require the classes themselves to be refactored to allow for testing and (c) only serve to bring coverage up to 100% for trivial lines, while advanced functionality is already covered by the other unit tests.

For now I will mark this as READY again, feel free to challenge me on any pylint violation or class assignment to the integration tests.

@qstokkink qstokkink changed the title WIP: TunnelCommunity multiprocessing READY: TunnelCommunity multiprocessing Nov 3, 2016
@devos50
Copy link
Contributor

devos50 commented Nov 8, 2016

@qstokkink I started reviewing, however, it might take a few days because I'm also busy with wrapping up the first alpha release of Tribler 7.

Regarding the tests: the coverage of you code is quite low and we want more coverage by unit tests. While at first glance, it seems that writing tests for various classes leads to integration tests, it is often not very hard to create small self-contained unit tests using mocking. Take for instance the ProcessManager class (https://github.com/qstokkink/tribler/blob/tunnelprocess_rc2/Tribler/community/tunnel/processes/processmanager.py). Using mocking, it's not very hard to verify the correct behaviour of methods in this class.

I don't expect that you cover all possible lines (since some are almost impossible to cover), however, the coverage of your code by the means of unit tests must be higher. The lack of unit tests is something that has plagued Tribler for many years and has lead to many regression bugs. Please try to keep these unit tests as small as possible and make sure they are only testing the thing that you actually want to test.

You can take a look at the test suite in Tribler, especially the Core tests. There are various classes which are tested using mocking. If you have any questions, please let me know!

@qstokkink
Copy link
Contributor Author

@devos50 I hate having to implement a factory pattern/adding complexity just for testing purposes. Your point about regression bugs is very valid though, so I will (attempt to) squeeze out some more tests.

@devos50
Copy link
Contributor

devos50 commented Nov 16, 2016

@qstokkink I've tried various torrents on MacOS (Ubuntu, Pioneer, Big Buck Bunny). Some of them are starting the download and others are not. When the download is starting, the subprocesses seems to work correctly 👍

I don't see significant improvements to the download speed, however, I don't download much anonymously.

22317dd4b03576499e88ac291271d71b

Will test on Windows next

@devos50
Copy link
Contributor

devos50 commented Nov 16, 2016

Also, review is coming!

@qstokkink
Copy link
Contributor Author

@devos50 Thanks for the efforts! This last week-and-a-half I have been doing thesis writing, so no real progress on the additional tests: although I have started working on the Gumby experiment.

Also, I guess it's at least good to see that the CPU usage is under control, even though the download speed hasn't improved.

Copy link
Contributor

@devos50 devos50 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reviewed your work. Overall, it's quite impressive. I see some advanced data structures and methods being used (SyncDict, RPC) and it's interesting to see how they are working together.

There is still a bit confusion left on my side on which processes there are exactly so maybe you can make this more clear (at least in your thesis)? It would also be nice if you can write a page on our readthedocs (http://tribler.readthedocs.io/en/latest/) about the structure of the new tunnel community so other users can read about it, however, this is probably something for after this PR.

I'm very positive that the code can be merged before the release of Tribler 7 and that we can ship it as an experimental feature that can be enabled in the settings pane (this means that it will be disabled by default unless @synctext has other ideas about this). I haven't tested on Windows yet since I'm unable to get the old GUI working on it so if you have any Windows computer around, please try it out to see whether it works.

Moreover, before the Tribler 7 release, we should test how well the code behaves in a frozen environment (after baking the final .dmg/.exe file that is distributed) since there are some minor quirks in such environments regarding multiprocessing.

There are still many Pylint issues open that are very easy to fix. Also, try to improve the code coverage by writing small unit tests.

If you comment my feedback, please use fixup commits so I don't have to review everything again after you have made changes :)

self.tunnel_community = self.dispersy.define_auto_load(
HiddenTunnelCommunityMultichain, dispersy_member, load=True, kargs=tunnel_kwargs)[0]
if self.session.get_tunnel_community_pooled():
if self.session.get_tunnel_community_test_pooled():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should add test-dependent code to our Core code base. Instead, you can load this test community in the test(s) where you need this.

self.tunnel_community = self.dispersy.define_auto_load(
PooledTunnelCommunity, dispersy_member, load=True, kargs=tunnel_kwargs)[0]
else:
if self.session.get_tunnel_community_test_pooled():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above (no test-dependent code in our Core)

"""
return self.sessconfig.get(u'tunnel_community', u'pooled')

def set_tunnel_community_test_pooled(self, value):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment about test-related code in the core code base.

@@ -912,6 +912,15 @@ def run(params=[""], autoload_discovery=True, use_torrent_search=True, use_chann
params = get_unicode_sys_argv()[1:]
else:
params = sys.argv[1:]

if "--tunnel_subprocess" in params:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also add this option to the twistd plugin, both to the Tribler runner (tribler_plugin.py) and the tunnel helper plugin (tunnel_helper_plugin.py).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please note that tribler_main.py will be removed in a while. I guess this code is quite important for the functioning of the pooled community?

Also, is it possible to have this code in a separate file?

self.ctrl_written += s

def clear_callbacks(self):
"""Deal with the aftermath of calling send_rpc without a response
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please have the """ and the text on separate lines (that's more in line with our used style of documentation)

# Configure MiniSession
config.set_state_dir(working_dir)
config.set_torrent_checking(False)
if not test_mode:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test-related code.

:return: RPC response code
:rtype: str
"""
if not self.session_started:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't these kind of events be logged?

required_endpoint,
info_hash)

def on_data(self, s):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rename this parameter to be more meaningful

@@ -1401,3 +1427,22 @@ def increase_bytes_received(self, obj, num_bytes):
else:
raise TypeError("Increase_bytes_received() was called with an object that is not a Circuit, " +
"RelayRoute or TunnelExitSocket")

def set_process(self, process):
"""Set the TUnnelSubprocess we belong to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra capital letter in the word TunnelSubprocess

circuit.unverified_hop = None

if circuit.state == CIRCUIT_STATE_EXTENDING:
ignore_candidates = [self.crypto.key_to_bin(hop.public_key) for hop in circuit.hops] + \
ignore_candidates = [self.hops[h].node_public_key for h in circuit.hops] + \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please name the h variable better.

@devos50 devos50 changed the title READY: TunnelCommunity multiprocessing WIP: TunnelCommunity multiprocessing Nov 19, 2016
@qstokkink
Copy link
Contributor Author

@devos50 Thanks for the review!

I have one rebuttal, for several of the comments on the names in the pooled_tunnel_community: the names in this class have been adopted to provide the same interface as the TunnelCommunity. If I were to change these particular method names, I would have to refactor the TunnelCommunity and the rest of Tribler.
The "do_circuits" LoopingCallback name is an exception, but this has also been adopted from the TunnelCommunity for clarity.

This also ties in to the TODO's for duplicate code: I was planning to give the TunnelCommunity and the PooledTunnelCommunity a shared interface/abstract parent class for structural clarity. However, I deemed this too complicated for this PR and left it as a TODO for the future.

@qstokkink
Copy link
Contributor Author

I just merged in the new GUI and it was kind of catastrophic.
A plethora of things is going wrong, including the subprocesses not exiting anymore.
For some reason downloading still works though:
newgui
At any rate, I probably wont have this done tomorrow.

@qstokkink qstokkink force-pushed the tunnelprocess_rc2 branch 5 times, most recently from f8e0092 to ea20c6c Compare February 12, 2017 11:20
@qstokkink
Copy link
Contributor Author

Welp. I finally got it working on Windows.

Turns out Windows has special binary flags for the stdio streams.

Fix incoming in a little bit.

@qstokkink qstokkink force-pushed the tunnelprocess_rc2 branch 2 times, most recently from 7dd823b to 431c065 Compare February 25, 2017 14:53
@qstokkink
Copy link
Contributor Author

@devos50 This is confirmed working (again) on Windows and Linux. What do you want to do with this PR? (rebasing this is going to be a little bit of a bitch, so I'll hold off on that until this gets greenlighted for merging)

@synctext
Copy link
Member

congrats on the Windows victory!

@devos50
Copy link
Contributor

devos50 commented Feb 25, 2017

@qstokkink good to hear! Why not rebase now? It has to be done anyways.

@synctext do we want to ship this as experimental feature in the upcoming Tribler 7 release or should we wait with merging until we released v7?

@synctext
Copy link
Member

If this is stable and mature, then it's great stuff to merge. Quality over Tribler feature set..

@qstokkink
Copy link
Contributor Author

Alright, history has been rewritten, pylint errors have been minimized (I can't get rid of the remaining ones or everything breaks 😃 ): as far as I can see this is ready for review again.

@qstokkink qstokkink changed the title WIP: TunnelCommunity multiprocessing READY: TunnelCommunity multiprocessing Feb 26, 2017
@qstokkink qstokkink force-pushed the tunnelprocess_rc2 branch 2 times, most recently from 496c5a1 to 0ca5155 Compare March 2, 2017 11:06
@qstokkink qstokkink force-pushed the tunnelprocess_rc2 branch from 0ca5155 to f7ca6ca Compare March 2, 2017 12:23
@qstokkink
Copy link
Contributor Author

qstokkink commented Mar 2, 2017

Changing this to WIP to implement some things on my wishlist for this PR:

  • Use os.environ instead of sys.argv
  • Use factory pattern for process spawning to make testing easier

@qstokkink qstokkink changed the title READY: TunnelCommunity multiprocessing WIP: TunnelCommunity multiprocessing Mar 8, 2017
@qstokkink qstokkink force-pushed the tunnelprocess_rc2 branch from 102000b to 8157422 Compare March 8, 2017 13:47
@qstokkink
Copy link
Contributor Author

With the advent of IPv8 and separating the GUI from the core, 90% of this PR is no longer required. We can filter out the useful stuff later, if/when we need it.

@qstokkink qstokkink closed this Jan 18, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

3 participants